arXivil 504.05994V 1 [stat.ME] 22 Apr 2015 


1 


On the relation between Gaussian process 
quadratures and sigma-point methods 

Simo Sarkka, Jouni Hartikainen, Lennart Svensson, and Fredrik Sandblom 


Abstract —This article is concerned with Gaussian process 
quadratures, which are numerical integration methods based on 
Gaussian process regression methods, and sigma-point methods, 
which are used in advanced non-linear Kalman filtering and 
smoothing algorithms. We show that many sigma-point methods 
can he interpreted as Gaussian quadrature based methods 
with suitably selected covariance functions. We show that this 
interpretation also extends to more general multivariate Gauss- 
Hermite integration methods and related spherical cubature 
rules. Additionally, we discuss different criteria for selecting the 
sigma-point locations: exactness for multivariate polynomials up 
to a given order, minimum average error, and quasi-random 
point sets. The performance of the different methods is tested in 
numerical experiments. 


I. Introduction 

Gaussian process quadratures 111-0 are methods to numer¬ 
ically compute integrals of the form 

2^[g] = yg(x)w(x)dx, (1) 

where g : M" i—K"* is a (non-linear) integrand function and 
w(x) is a given, typically positive, weight function such that 
/ w{'x) dx < oo. In Gaussian process quadratures the function 
g(x) is approximated with a Gaussian process regressor 0 
and the integral is approximated with that of the Gaussian 
process regressor. 

Sigma-point methods 0-0 can be seen GZ) as methods 
which approximate the above integrals via 

yg(x)w(x) dx«^Wig(xi), (2) 

where Wi are some predehned weights and x^ are the sigma- 
points (classically called abscissas). Typically the evaluation 
points and weights are selected such that when g is a multi¬ 
variate polynomial up to a certain order, the approximation is 
exact. 

A particularly useful class of methods is obtained when 
the weight function is selected to be a multivariate Gaussian 
density u>(x) = N(x | m,P). In the context of Gaussian 
process quadratures it then turns out that the integral of the 
Gaussian process regressor can be computed in closed form 
provided that the covariance function of the process is chosen 
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to be a squared exponential 0 (i.e., exponentiated 
quadratic). This kind of quadrature methods are also often 
referred to as Bayesian or Bayes-Hermite quadratures. They 
are closely related to Gauss-Hermite quadratures in the sense 
that as Gaussian quadratures can be seen to form a polynomial 
approximation to the integrand via point-evaluations, Gaussian 
process quadratures use a Gaussian process regression approx¬ 
imation instead 0-|3|. Because Gaussian process regressors 
can be used to approximate much larger class of functions 
than polynomial approximations 0, they can be expected to 
perform much better also in numerical integration. 

The selection of a Gaussian weight function is also partic¬ 
ularly useful in non-linear hltering and smoothing, because 
the equations of non-linear Gaussian (Kalman) biters and 
smoothers 0, 0 -||22| consist of Gaussian integrals of the 
above form and linear operations on vectors and matrices. 
The selection of different weights and sigma-points leads to 
different brands of approximate biters and smoothers 0 - 
For example, the multidimensional Gaussian type of Gauss- 
Hermite quadrature and cubature based biters and smoothers 
|21|-|25| are based on explicit numerical integration of the 
Gaussian integrals. The unscented transform based methods 
as well as other sigma-point methods 0-0 can also be 
rebospectively interpreted to belong to the class of Gaussian 
numerical integration based methods p^ . Conversely, Gaus¬ 
sian type of quadrature or cubature based methods can also 
be interpreted to be special cases of sigma-point methods. 
Furthermore, the classical Taylor series based methods |26| 
and Stirling’s interpolation based methods 0, 0 can be 
seen as ways to approximate the integrand such that the 
Gaussian integral becomes tractable (cf. 0). The recent 
Fourier-Hermite series |28|, Hermite polynomial p9) methods 


are also based on numerical approximation of the integrands. 

The aim of this article is to present new Gaussian process 
quadrature based methods for non-linear bitering and smooth¬ 
ing, and to analyze their connection with sigma-point methods 
and multivariate numerical integration methods. We show 
that many sigma-point bitering and smoothing algorithms 
such as unscented Kalman biters and smoothers, cubature 
Kalman biters and smoothers, and Gauss-Hermite Kalman 
biters and smoothers can be seen as special cases of the 
proposed methods with suitably chosen covariance functions. 
More generally, we show that many classical multivariate 
Gaussian quadrature methods, including Gauss-Hermite rules 
1301, and symmetric integration formulas 0 are special cases 
of the present methodology. We also discuss different criteria 
for selecting the sigma-point (abscissa) locations; exactness 
for multivariate polynomials up to a given order, minimum 
average error, and quasi-random point sets. 
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This article is an extended version of the conference ar¬ 
ticle Q, where we analyzed the use of Gaussian process 
quadratures in non-linear filtering and smoothing as well 
as their connection to the unscented transform and Gauss- 
Hermite quadratures. In this article, we deepen and sharpen 
the analysis of those connections and extend our analysis to a 
more general class of spherically symmetric integration rules. 
We also analyze different sigma-point selection schemes as 
well as provide more extensive set of numerical experiments. 


II. Background 


A. Non-Linear Gaussian (Kalman) Filtering and Smoothing 

Non-linear Gaussian (Kalman) filters and smoothers in)’ 
-|[2^ are methods which can be used to approximate 
the filtering distributions p(xfc | yi,... ,yk) and smoothing 
distributions p(xfe | yi,...,yT) of non-linear state-space 
models of the form 


Xfc = f(xfe_i) -f qfc_i, 
yk = h(xfc) -f Tk, 


(3) 


where, for k = 1,2,... ,T, x^ G M" are the hidden states, 
yk G are the measurements, and q^-i ^ N(0, Qfe_i) 
and Tfc ~ N(0, R^) are the process and measurements noises, 
respectively. The non-linear function f( ) is used to model the 
dynamics of the system and h( ) models the mapping from 
the states to the measurements. 

Non-linear Gaussian filters (see, e.g., GZl’ page 98) are 
general methods to produce Gaussian approximations to the 
filtering distributions: 


p(xfc I yi, ■ ■ ■ ,yfc) « N(xfe I mfc,Pfc), k=l,2,...,T. 

(4) 

Non-linear Gaussian smoothers (see, e.g., GZl’ page 154) are 
the corresponding methods to produce approximations to the 
smoothing distributions: 


p(xfc I yi,..., yr) « N(xfe | P^), fc = 1, 2,..., T. 

(5) 

Both Gaussian filters and smoothers above can be easily 
generalized to state-space models with non-additive noises (see 
113), but here we only consider the additive noise case. 

A general additive noise one-step moment-matching-based 
Gaussian filter algorithm can be written in the following form. 

Algorithm II.1 (Non-Linear Gaussian filter). The prediction 
and update steps of the non-linear Gaussian (Kalman) filter 
are 

• Prediction: 

= y f(xfc_i) N(xfc_i I nik-i,Pk-i) Axk-i 

= J (f(xfc-i) - ) (f (xfe_i) - m^: y 

X N(xfc_i I mfc_i,Pfc_i) dxfe_i -f Qfc_i. 

( 6 ) 


• Update: 

t^k = J h(xfc) N(xfe I m^,P^) dxfc 

Sfc = y (h(xfc) - fi^,) (h(xfc) - 

X N(xfe I m“,P“) dxfe -l-Rfe 
Ck = y (xfe - m~) (h(xfe) - (7) 

X N(xfe I m~,P“) dxfe 

Kk = Cfc Sfc 1 

nifc = m“ -f Kk [yk - f-k) 

P/c = Pfc - Kfe Sfe Kj. 

The filtering is started from initial mean and covariance, mo 
and Pq, respectively, such that Xq ~ N(mo,Po). Then the 
prediction and update steps are applied for k = 1,2,2),... ,T. 

The result of the filter is a sequence of approximations 

P(xfc I yi,...,yfc) « N(xfe | mk,Pk), k = l,2,...,T. 

( 8 ) 

The corresponding smoothing algorithm can be written in the 
following form. 

Algorithm II.2 (Non-Linear Gaussian RTS smoother). The 
equations of the non-linear Gaussian (Rauch—Tung—Striebel, 
RTS) smoother are the following m & 

nifc+i = j f{xk)N{xk I mfe,Pfc) dx^ 

Pfc+i = y[f(xfc) - [f(xfc) - 

X N(xfc I mfe,Pfc) dxfc -f Qfc 

Dfc+i = y[xfc - mfc] [f(xfc) - (9) 

X N(xfe I mfe,Pfc) dxfc 
Gfc = Dfe+i [P-^J-I 
mfc = mfe -f Gk 

P^=Pfe + Gfc (P|+i-P^+i)Gl. 

The smoothing recursion is started from the filtering result of 
the last time step k = T, that is, = m^, P|. = Pp cind 
proceeded backwards for k = T — 1,T — 2,... ,1. 

The approximations produced by the smoother are 

p(xfc I yi,...,yT) «N(xfc | m^,P^.), fc = 1,2, ...,T. 

( 10 ) 

Both the filter and smoother above can be derived from the 
following Gaussian moment matching ’’transform” | [T7| (the 
terminology comes from unscented transform). 

Algorithm II.3 (Gaussian moment matching of an additive 
transform). The moment matching based Gaussian approx¬ 
imation to the joint distribution of x and the transformed 
random variable y = g(x) -f q, where x ~ N(m, P) and 
q ~ N(0, Q), is given by 



( 11 ) 
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where 

tJ'M = J g(x) N(x I m,P) dx, 

Sm = y (g(x) - Mm) (g(x) - Mm)"^ N(x I m, P) dx + Q, 
Cm = Ax - m) (g(x) - Mm)^ N(x | m, P) dx. 

( 12 ) 


B. Gaussian Integration and Sigma-Point Methods 

Sigma-point filtering and smoothing methods can generally 
be described as methods which approximate the Gaussian 
integrals in the Gaussian filtering and smoothing equations 
(and in the Gaussian moment matching transform) as 

y g(x) N(x I m,P) dx « ^ Wig(x,), (13) 


where Wi are some predefined weights and x^ are the sigma- 
points. Typically, the sigma-point methods use so called 
stochastic decoupling which refers to the idea that we do a 
change of variables 


g(x) N(x I m, P) dx = y g(m -f- N(^ | 0,1) d| 

(14) 


i(€) 


where P = s/P VP ■ This implies that we only need to 
design weights Wi and unit sigma-points for integrating 
against unit Gaussian distributions; 

/'g(ON(||0,I)d4«^H^,g(|J, (15) 

i 

thus leading to approximations of the form 

y g(x) N(x I m, P) dx « ^ Wi g(m -f VP |i). (16) 


Different sigma-point methods correspond to different 
choices of weights Wi and unit sigma-points ^i. For example, 
the canonical unscented transform Q uses the following set 
of 2n -h 1 weights (recall that n is the dimensionality of the 
state) and sigma-points: 


Md) = —= f = l,...,2n, 

n-\- K 2[n K) 

r 0, i = 0, 

= Wtt + KGi, i = l,...,n, 

\—\/n -\- nSi-n, i = n-\-1,... ,2n. 


(17) 


where k is a design parameter in the algorithm and Oi S K" 
is the unit vector towards the direction of the fth coordinate 
axis. 

Note that sigma-point methods sometimes use different 
weights for the integrals appearing in the mean and covariance 
computations of Gaussian filters and smoothers. However, 
here we will only concentrate on the methods which use 
the same weights for both in order to derive more direct 
connections between the methods. For example, the above 
unscented transform weights are just a special case of more 
general unscented transforms (see, e.g., ini). 


C. Gaussian Process Regression 

Gaussian process quadrature Q, is based on forming 
a Gaussian process (GP) regression 0 approximation to the 
integrand using pointwise evaluations and then integrating the 
approximation. In GP regression 0 the purpose is to predict 
the value of an unknown function 


o = p(x) 


( 18 ) 


at a certain test point (o*,x*) based on a finite number of 
training samples V = {(oj,Xj) : j = l,...,iV} observed 
from it. The difference to classical regression is that instead 
of postulating a parametric regression function 5 e(x; 9), where 
6 G are the parameters, in GP regression we put a Gaus¬ 
sian process prior with a given covariance function IC(x,x') 
on the unknown functions gx(x). 

In practice, the observations are often assumed to contain 
noise and hence a typical model setting is: 

gx ^ GP(0,K(x,x')) 

Oj = gxixj) + Cj, €j ~ N(0,cr^), 

where the first line above means that the random function 
gx has a zero mean Gaussian process prior with the given 
covariance function IC(x,x'). A commonly used covariance 
function is the exponentiated quadratic (also called squared 
exponential) covariance function 

iT(x,x') = exp ^-^^llx-x'll^^ , (20) 


where s,£ > 0 are parameters of the covariance function (see 

0 ). 

The GP regression equations can be derived as follows. 
Assume that we want to estimate the value of the noise-free 
function g(x) based on its Gaussian process approximation 
gx(x) at a test point x given the vector of observed values 
o = (oi,..., ojv)- Due to the Gaussian process assumption 
we now get 




/K + a^I k(x) \\ 

kT(x) K(x,x)JJ 


( 21 ) 


where K = [K{xi,Xj)] is the joint covariance of observed 
points, K{x,x) is the (co)variance of the test point, k(x) = 
[iT(x, Xi)] is the vector cross covariances with the test point. 

The Bayesian estimate of the unknown value of gx{^) 
is now given by its posterior mean, given the training data. 
Because everything is Gaussian, the posterior distribution is 
Gaussian and hence described by the posterior mean and 
(auto)covariance functions; 

E[gx{^) I o] = k'^(x) (K -k o 

Cov[pif(x) I o] = K{x,x') - k'^(x) (K -h cr^I)^^k(x'). 

( 22 ) 


These are the Gaussian process regression equations in their 
typical form 0, in the special case where g is scalar. 
The extension to multiple output dimensions is conceptually 
straightforward (see, e.g., |[7|, |[32|), but construction of the 
covariance functions as well as the practical computational 
methods tend to be complicated p^ , 0. However, a typical 
easy approach to the multivariate case is to treat each of the 
dimensions independently. 
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D. Gaussian Process Quadrature 

In Gaussian process quadrature 0,0 the basic idea is to 
approximate the integral of a given function g against a weight 
function w(x), that is, 

= J 5 (x) w(x) dx, (23) 

by evaluating the function p at a hnite number of points and 
then by forming a Gaussian process approximation gx to the 
function. The integral is then approximated by integrating the 
Gaussian process approximation (or its posterior mean) which 
is conditioned on the evaluation points instead of the function 
itself. Here we assume that g is scalar for simplicity as we 
can always take a vector function elementwise. 

Gaussian process quadratures are related to a regression 
interpretation of classical Gaussian quadrature integration, that 
is, we can interpret these integration methods as orthogonal 
polynomial approximations of the integrand evaluated at cer¬ 
tain finite number of points 0. The integral is then approx¬ 
imated by integrating the polynomial instead of the original 
function. However, the aim of Gaussian process quadrature is 
to get a good performance in average, whereas in classical 
polynomial quadratures the integration rule is designed to be 
exact for a limited class of (polynomial) functions. Still, these 
approaches are very much linked together 0. 

Due to linearity of integration, the posterior mean of the 
integral of the Gaussian process regressor is given as 


E 


j gK{yi)w{x) dx I o = yE[pif(x) | o] w(x) dx, 


(24) 


where the “training set” o = (p(xi),...,p(x 7 v)) now con¬ 
tains the values of the function g evaluated at certain selected 
inputs. 

The posterior variance of the integral can be evaluated in an 
analogous manner, and it is sometimes used to optimize the 
evaluation points of the function gx 0-0- The posterior 
covariance of the approximation is 


Var 


gK{x)w{x) dx I o 


JJ Cov[gK{x) I o] u>(x) dxw(x') dx'. 


(25) 


That is, when we approximate the integral 
posterior mean we have 


with the 


p(x) w{x) dx : 


J lJ{x)w{x) dx 


(K + a"I)-%, 


(26) 


The posterior variance of the (scalar) integral is 


Var 


Pk(x) ^(x) dx I o 

JJ iT(x, x')w(x) dxu;(x') dx' 


J k^{x)w{x) dx 


(K + ct^I)- 


k(x') w(x') dx' 
(27) 


In this article we are specihcally interested in the case of 
Gaussian weight function, which then reduces the integral 
appearing in the above expressions ( [26] l and ( [27] i to 

J k'r(x) m;(x) dx 

It is now easy to see that when the covariance function is a 
squared exponential iT(x, Xi) = exp(—|jx—Xi|j^), 
this integral can be easily computed in closed form by using 
the computation rules for Gaussian distributions. Furthermore 
if the covariance function is a multivariate polynomial, then 
these integrals are given by the moments of the Gaussian 
distributions, which are also available in closed form. 


= / iT(x, Xi) N(x I m, P) dx (28) 


III. Gaussian process quadratures for sigma-point 

EILTERING AND SMOOTHING 

In this section we start by showing how Gaussian process 
quadratures (GPQ) can be seen as sigma-point methods and 
then introduce the Gaussian process transform. The Gaussian 
process transform then enables us to construct GPQ-based 
non-linear biters and smoothers analogously to GZl- 


A. GPQ as a sigma-point method 

In this section the aim is to shown how Gaussian process 
quadratures (GPQ) can be seen as sigma-point methods. 


Lemma III.l (GPQ as a sigma-point method). The Gaussian 
process quadrature (or Bayes-Hennite/Bayesian quadrature) 
can be seen is a sigma-point-type of integral approximation 

N 

J g(x) N(x I m, P) dx « ^ Wi g(xQ, (29) 

i—1 

where x^ = m -f ■\/P^i with the unit sigma-points are 
selected according to a predefined criterion, and the weights 
are determined by 


W, = 


(^J kT(ON(| |0,I) d^^ (K + a^I)-! 


(30) 


where K = is the matrix of unit sigma-point 

covariances and k(^) = is the vector cross co- 

variances. In principle, the choice of unit sigma-points above 
is completely free, but good choices of them are discussed in 
the following sections. 


Proof: Let us brst use stochastic decoupling ( [T4| which 
enables us to only consider unit-Gaussian integration formulas 
of the form GD- Because we can integrate vector functions 
element-by-element, without loss of generality we can assume 
that p(x) is single-dimensional. Let us now model the function 
^ 5 (m -f as a Gaussian process gx with a given 

covariance function and bx the training set for the 

GP regressor by selecting the points i = 1,..., N, which 
also determines the corresponding points x^ = m + Vp^* 
such that the training set is o = (^(xi),... ,g{xx)). The GP 
approximation to the integral now follows from ( |26l i: 

J p(m-fVPI) N(^|0,I) d| 


kT(0 N(^ |0,I) d| 


(K + a^Q-io, 


(31) 
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which when simplihed and applied to all the dimensions of g 
gives the result. ■ 

Note that above we actually assume that the stochastically- 
decoupled-function £ i—p(m + 4) instead of the original 

integrand p(x) has the given covariance function. The reason 
for this modeling choice is that it enables us to decouple the 
mean and covariance from the integration formula and hence is 
computationally benehcial. This also makes the result invariant 
to affine transformations of the state and it also has a property 
that the variability of the functions corresponds to the scale 
of the problem. However, on the other hand, one might argue 
that it is the function p(x) which should actually model and 
using the stochastically-decoupled-function is “wrong”. 

Remark III.l (Variance of GPQ). From Equation ( [ZT] ) we get 
that the component-wise variances of the Gaussian process 
quadrature approximation can be expressed as 


can even be sometimes used to compensate for inaccuracies 
in modeling. 


Example III.l (GPT with squared exponential kernel). Let us 
now consider ^ € K and select the sigma-point locations to be 
the ones of unscented transform With the squared expo¬ 
nential covariance function and noise-free measurements 
(cr^ =0) we then get the weights: 




W'o:2 = 


/P+T ( e^-l^ "" 


(k+1) 


2,2 (^^2+1) + 


» + l 

/l2 + \ ( 


( 2 ^ 2 + 3 ) („+i) 

, 2,2{i2j^C) I 2 (,2 + 1) _ 


v ,= JJ I 0,1) d|N(|' I 0,1) d|' 

- [ kT(|)N(C|0,I)d|(K + a2l)-i 


V 


,_ \ ^ 

/£2 + l ( e^2- _i 


I 0) I) interesting property is that in the limit f —>■ 00 w get 

(32) 


(35) 


Using the above integration approximations we can also 
dehne a general Gaussian process transform as follows. The 
reason for introducing the transform is that the corresponding 


lim Wo :2 = 

^—>•00 


/ 


2{k+1) 
\ 2(k+1) 


(36) 


approximate biters and smoothers can be readily constructed which are the unscented transform weights. We return to this 
in terms of the transform (cf |[T7|), which we will do in the relationship in Section \IV-D\ 


next section. 


Algorithm III.l (Gaussian process transform). The Gaussian 
process quadrature based Gaussian approximation to the joint 
distribution of x and the transformed random variable y = 
g(x) + q, where x ~ N(m, P) and q ~ N(0, Q), is given by 






(33) 


where 


Xi = m + 

N 

/^GP = 

N 

Sgp = “ ^^GP) (g(Xi) - Mgp)^ + Q: 

N 

Cgp (xi - m) (g(x,) - 

2=1 


(34) 


where is some fixed set of sigma/training points and 
the weights are given by Equation with some selected 
covariance function ). 


In this article, at least in the analytical results, we usually 
assume that the measurements are noise-free, that is, = 
0. This enables us to obtain analytically exact relationships 
with the classical quadrature methods. However, when using 
Gaussian process quadratures as numerical integration method, 
it is often benehcial to have at least a small non-zero value 
for cr^ in (|30|. This kind of “jitter” stabilizes numerics and 


B. GPQs in filtering and smoothing 

In this section we show how to construct biters and 
smoothers using the Gaussian process quadrature approxima¬ 
tions. Because Algorithm III.l can be seen as a sigma-point 
method, analogously to other sigma-point biters considered, 
for example, in we can now formulate the following 

sigma-point biter for model Q, which uses the the unit sigma- 


points and weights Wi debned by Algorithm III.l 


Algorithm III.2 (Gaussian process quadrature biter). The fil¬ 
tering is started from initial mean and covariance, mo and Pq, 
respectively, such that Xq ~ N(mo,Po)- Then the following 
prediction and update steps are applied for k = 1, 2, 3,..., T. 
Prediction: 

(i) 

1) Fonn the sigma points as follows: = xn-k-i + 

2) Propagate the sigma points through the dynamic model: 

3) Compute the predicted mean m^, and the predicted 
covariance P^.' 


N 




2=1 

N 




(i) 


^k) + Qk-1- 


Update: 
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1) Form the sigma points: Xf, + \JPf, i = 

2) Propagate sigma points through the measurement 

model: = h(A’jr^*^), * = 1... 

3) Compute the predicted mean Pf., the predicted covari¬ 
ance of the measurement Sfc, and the cross-covariance 
of the state and the measurement Ci~: 

N 

N 

Sfc = ^ - Mfc) + Rfc, 

N 

Cfe = ^ m (A’-W - 

i=l 

4) Compute the filter gain and the filtered state mean 
mfc and covariance Pfc, conditional on the measurement 

yc 


can also be derived analogously as was done in the same 
reference. 

IV. Selection of covariance functions and 
SIGMA-POINT locations 

The accuracy of the Gaussian process quadrature method 
and hence the accuracy of the hltering and smoothing methods 
using it is affected by 

1) the covariance function used and 

2) the sigma-point locations 

Once both of the above are fixed, the weights are determined 
by Equation ( |30l l. In this section we discuss certain useful 
choices of covariance functions as well as ’’optimal” choices of 
sigma-point locations for them. We also discuss the connection 
of the resulting methods with sigma-point methods such as 
unscented transforms and Gauss-Hermite quadratures. 

A. Squared exponential and minimum variance point sets 


Kfc = CfcSfc^ 

nik=mf -\- Kfc [yk - Pk\, 

Pfc = Pfe -KfeSfeKj. 

Further following the line of thought in GZ) we can formu¬ 
late a sigma-point smoother using the unit sigma-points and 
weights from Algorithm |III.1| 

Algorithm III.3 (Gaussian process quadrature sigma-point 
RTS smoother). The smoothing recursion is started from 
the filtering result of the last time step k = T, that is, 
= rriT’, P|, = Pj’ and proceeded backwards for 
fc = T—1,T — 2,...,1 as follows. 

1) Form the sigma points: X^ = m.k -f yPk^ifi = 

2) Propagate the sigma points through the dynamic model: 

xlfl,=^{xlp),^ = f...,N. 

3) Compute the predicted mean 111 ^,^^^, the predicted co- 

variance and the cross-covariance Dfc+i.' 


N 

N 


fe+1’ 


^fc+l ~ ^ i^k + 1 (^k+1 


i=l 

N 


Dfc+i = ^ TV. - mfc) (Xl% - 


4) Compute the gain Gk, mean and covariance P| as 
follows: 


In a machine learning context |[7j the default choice for 
a covariance function of a Gaussian process is the squared 
exponential covariance function in Equation ( |20l i. What makes 
it convenient in Gaussian process quadrature context is that the 
integral required for computing the weights in Equation ( |30| ) 
can be evaluated in closed form (cf. Q, ifTSl). It turns out 
that the posterior variance can be computed in closed form as 
well which is useful because for a given set of sigma-points 
we can immediately compute the expected error in the integral 
approximation (assuming that the integrand is indeed a GP) - 
this is possible because the variance does not depend on the 
observations at all. 

One way to determine the sigma-point locations is to select 
them to minimize the posterior variance of the integral approx¬ 
imation In our case this corresponds to minization of 

the variance in Equation ([32]) with respect to the points 
Although the minimization is not possible in closed form, 
with a moderate N this optimization can be done numerically. 
Figure [T] shows examples of minimum variance point sets 
optimized by using the Broyden-Fletcher-Goldfarb-Shanno 
(BEGS) algorithm | |35| . 

The squared exponential covariance function is not the 
only possible choice for a covariance function. From the 
machine learning context we could, for example, choose a 
Matern covariance function or some of the scale-mixture-based 
covariance functions ||7). In that case the weight integral ( |30| ) 
becomes less trivial, but at least we always have a chance 
to precompute the weights using some (other) multivariate 
quadrature method. The sigma-point optimization could also 
be done similarly as for the squared exponential covariance 
function. 


Gk — Dfc+i , 

ml = mk + Gk (nifc+i - 

Pl = Pk + Gk {PU,-Pf^,)Gl. 

Note that we could cope with non-additive noises in the 
model by using augmented forms of the above hlters and 
smoothers as in GZl- The hxed-point and hxed-lag smoothers 


B. UT and spherical cubature rules 

In addition to the squared exponential covariance funtion, 
another useful class of covariance function are polynomial 
covariance functions. They correspond to linear-in-parameters 
regression using polynomials as the regressor functions. It 
turns out that also for polynomial covariance functions we can 
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Fig. 1. Minimum vaiiance point sets for the squared exponential covariance 
function. 


compute the weights ( [30l l in closed form. What is even more 
interesting is that the Gaussian process quadratures reduce to 
classical numerical integration methods. In this section we 
show that with certain selections of symmetric evaluation 
points we get a classical family of spherically symmetric 
integration methods of McNamee and Stenger m of which 
the unscented transform 0 can be (retrospectively) seen 
as a special case GD- More detailed information on the 
multivariate Hermite polynomials used below can be found 
in Appendix 

Theorem IV.l (UT covariance function). Assume that 

H H H 

9=0|J|=gp=0|/|=p 

(37) 

where \x.j ’s form a positive definite covariance matrix 
and Hx{^) are multivariate Hermite polynomials (see Ap¬ 
pendix 0- If we now select the evaluation points as in UT 
GZl. then the GPQ weights Wi become the UT weights. Fur¬ 
thermore, the posterior variance of the integral approximation 
is exactly zero. 

Proof: The prior gx ~ GP(0, with the above 

covariance is equivalent to a parametric model of the form 

3 1 

5i<r(^)=5I 51 (38) 

P=0 |I|=p 


where cx are zero mean Gaussian random variables with the 
covariances Xx,j = E [ci cj]. When the joint covariance 
matrix A = [\x,j\ is non-singular, the posterior covariance 
of the integral being zero is equivalent to that the integral 
rule is exact for all functions of the form ( |38| with arbitrary 
coefficients. Clearly with the UT evaluations points, the UT 



weights are the unique ones that have this property (see, e.g., 
i fTTl ) and hence the result follows. ■ 

Note that the above result also covers the cubature transform 
(CT), that is, the moment matching rule used in the cubature 
Kalman filter (CKF) and the smoother, because the transform 
is a special case of UT fTTI . 

Theorem IV.2 (Higher order UT covariance function). Assume 
that 




1 


9=0|J|=gp=0|/|=p 


Xx,j Hx{i) Hj{^) 


(39) 

If we select the evaluation points according to order P = 
5, 7, 9,... rules in we obtain the higher order integration 
formulas in which are often referred to as fifth order, 

seventh order, ninth order and higher order UTs. 

Proof: The result follows analogously to the 3rd order 
case above. ■ 


Example IV.l (Derivation of UT weights from GPQ). Let 
^ S and consider the GPQ with UT sigma-points and 
the covariance function 0. With a = 0 and Xx,j = ^x.j 
we then obtain the covariance matrix in 0. It also turns out 
that 

y'kT(|)N(^ |0,I)d|=(l ••• 1) (41) 

and finally 

TI/ _ f 1 1 t 1 ^ 

rr0:4 — I k+2 2(k-|-2) 2 (k-|-2) 2 (k-|-2) 2 (k-|-2) J ! 

(42) 

which are indeed the UT weights. 


C. Multivariate Gauss-Hermite point sets 

The multivariate Gauss-Hermite point sets (see, e.g., 1IZ)> 
1211) of order P are exact for monomials of of the form xf X 
• • • X where pi < 2P — 1 for i = 1,..., n. This implies 
the following covariance function class. 


Theorem IV.3 (Gauss-Hermite covariance function). Assume 
that 

K{Li')= E E 

max J'<2P— 1 maxX<2P— 1 

( 43 ) 
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(40) 
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(a) GH-4 


Fig. 3. Gauss-Hermite point sets. 
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(b) GH-5 


o 

o 

o 
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where \x.j 's form a positive definite covariance matrix and 
Hx{^) are multivariate Hermite polynomials. If we now select 
the evaluation points to form a cartesian product of roots of 
Hermite polynomials of order P, then the GPQ weights Wi 
become the multivariate Gauss-Hermite quadrature weights. 
The posterior variance of the integral approximation is again 
exactly zero. 

Proof: Again the result follows from the equivalence 
of the polynomial approximations and polynomial covariance 
functions together with the uniqueness of the Gauss-Hermite 
rule for exact integration of this same function class. ■ 
Even when we are using polynomial covariance functions, 
we are by no means restricted to using the specific points 
sets corresponding to the classical integration rules. However, 
obviously, given the order of the polynomial kernel and 
number of sigma-points they are also minimum variance points 
sets and hence good choices also in average - provided that 
the integrand is indeed a polynomial. In any case, for an 
arbitrary set of sigma-points we can use Equation ( |30| to give 
the corresponding minimum variance weights. 


D. Connection between squared exponential and polynomial 
Gaussian process quadratures 

As discussed in the Gaussian process quadrature with 
squared exponential covariance function also has a strong 
connection with classical quadrature methods. This is because 
we can consider a set of damped polynomial basis functions 
of the form = x'' exp(—x^/(2f^)), which at least 

informally speaking can be seen to converge to a polynomial 
basis when £ ^ oo. We can now consider a family of random 
functions (Gaussian processes) of the form 


ge{x) = ^ Cj 4>k{x) = ^ Cj x^ exp 

3 3 



(44) 


where Cj ^ N(0, ^). The covariance function of this 

class is now given as 



which is the squared exponential covariance function. 

Based on the above, Minka Q argued (although did not for¬ 
mally prove) that GPQs with the squared exponential covari¬ 
ance functions should converge to the classical quadratures. 
This argument is indeed backed up by our analytical example 
in Example |III. 1 1 where this covergence indeed happens. 


E. Random and quasi-random point sets 

Recall that one way to approximate the expectation of 
g(^) over a Gaussian distribution N(0,I) is to use Monte 
Carlo integration. In that method we simply draw N samples 
from the Gaussian distribution ~ N(0,I) and use them 
as sigma-points. The classical Monte Carlo approximation to 
the integral would now correspond to setting Wi = \/N. 
Alternatively, we could use these random points as sigma- 
points and evaluate their weights by Equation ( [30l l. This leads 
to an approximation, which is sometimes called the Bayesian 
Monte Carlo approximation |36), 

Instead of sampling from the normal distribution, we can 
also use quasi-random points sets such as the Hammersley 
point sets |38|, |39|. These are points sets which are designed 
to give a smaller error in average than random points. The 
classical method would correspond to setting all weights to 
Wi = 1/N, but again, we can also use Equation ( |30l l to 
evaluate the weights for the GP quadrature. This corresponds 
to a ’’Bayesian quasi Monte Carlo” approximation to the 
integral. Some examples of Hammersley point sets are shown 
in Eigure 


V. Numerical Results 


A. Covariance functions and regression implied by unscented 
transform 


The unscented transform covariance functions of orders 3-7 
(see Theorems IV. 1 and |IV.2| l and the exponentiated quadratic 
(i.e., the squared exponential, SE) covariance function (Eq. 
( |20| )) are illustrated in Eig. The polynomial nature of the 
unscented transform (UT) covariance function can be clearly 
seen in the figures - the UT covariance function as such does 
not have such a simple local-correlation-interpretation as the 
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Fig. 4. Hammersley point sets. 



(a) UT-3 


(b) UT-5 



(c) UT-7 (d) SE 

Fig. 6. Regression with covariance functions for UT and SE. 



certainly grows with the polynomial (and thus UT) order. 

B. Illustrative high-dimensional example 

We use the same test case as in Section VIII.A. of 1^ , that 
is, the computation of the first two moments of the function 
y(x) = (\/l + for p = 1, —2, —3, —5. We thus aim to 

approximate the following integrals; 

E[t/(x)] = f (^\/l + x^x^ N(x|m, P)dx, (46) 



(c) UT-7 (d) SE 

Fig. 5. Covariance functions corresponding to different orders of unscented 
transforms (UT) and the squared exponential (SE) covariance function (s = 1, 
£ = 1 /2) for a single-input scalar-valued Gaussian process. 


SE covariance function has as the UT covariance functions 
simply blow up polynomially when moving away from the 
diagonal. 

The corresponding Gaussian process regression results on 
random data are illustrated in Fig. The polynomial nature 
of the unscented transform can be clearly seen in the figures. 
The Gaussian process prediction with the unscented transform 
covariance function has a clear polynomial shape as expected. 
Clearly the polynomial fit has less flexibility to explain the 
data than the exponentiated quadratic fit although the flexibility 


E[y2(x)] = J (l-fx'^x)^ N(x I m,P) dx. (47) 

Figure |7] shows the result of using the following methods as 
function of the state-dimensionality: 

• Cubature: The 3rd order spherical cubature sigma-points 
(2n points) with the standard integration weights. 

• GPQ-Cubature: The Gaussian process quadrature with 
SE covariance function and the 3rd spherical cubature 
sigma-points above. 

• GPQ-Hammers ley: The Gaussian process quadrature with 
SE covariance and 2n Hammersley points. 

The 3rd spherical cubature points refer to the integration rule 
proposed in pT) , which was also used in the cubature Kalman 
filter (CKF) in ]24|. In the rule, the sigma-points of placed to 
the intersections coordinate axes with the origin-centered n- 
dimensional hypersphere of radius y/n. 

The results in Figure]^ show that the GPQ quite consistently 
gives a bit lower KL-divergence and hence better result than 
the plain cubature when the cubature points are used. When 
Hammersley point sets are used, the results vary a bit more; 
with small state dimensions the results are slightly worse than 
with the cubature points. When p ^ 1, the Hammersley results 
are much better in high dimensions whereas with p = 1 the 
results are worse than with the cubature point sets. 
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(e) GPQ-Cubature for EZl 



Fig. 7. Compaiison of different methods in computing the moment integrals used in (24| for illustrating the performance of the cuhature rule. It can be seen 
that the GPQ methods outperform the cubature rule in most of the cases. 


C. Univariate non-linear growth model 

In this section we compare the performance of the different 
methods in the following univariate non-linear growth model 
(UNGM) which is often used for benchmarking non-linear 
estimation methods; 

Xk = \xk-i + 25 -f 8 cos(1.2A:) + qu-i, 

2 (48) 

Vk = ^^k+Xk, 

where xq ^ N(0, 5), qk-i ~ N(0,10), and ~ N(0,1). 

We generated 100 independent datesets with 500 time 
step each and applied the following methods to it: extended, 
unscented (k — 2), and cubature filters and smoothers 
(EKF/UKF/CKF/ERTS/URTS/CRTS); Gauss-Hermite filters 
and smoothers with 3, 7, and 10 points (GHKF/GHRTS); 
Gaussian process quadrature filter and smoother with un¬ 
scented transform points (GPKFU/GPRTSU) and cubature 
points (GPKFC/GPRTSC); with Hammersley point sets of 
sizes 3, 7, and 10 (GPKFH/GPRTSH); and with minimum 
variance points sets of sizes 3, 7, and 10 (GPKFO/GPRTSO). 
The covariance function was the exponentiated quadratic with 
s = 1 and £ = 3 and the noise variance was set to 10“®. The 
RMSF results together with single standard derivation bars are 
shown in Figures and [9] As can be seen in the figures, with 
5 and 10 points the Gaussian process quadrature based filters 
and smoothers have significantly lower errors than almost all 
the other methods - only Gauss-Hermite with 10 points and 
the cubature RTS smoother come close. 


Fig. 8. 



RMSE results of filters in the UNGM experiment. 


D. Bearings only target tracking 


In this section we evaluate the methods in the bearings 
only target tracking problem with a coordinated-turn dynamic 
model, which was also considered in Section III.A of the 
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Fig. 9. RMSE results of smoothers in the UNGM experiment. 


Fig. 10. Position RMSE results of filters in the healings only experiment. 


article p2). The non-linear dynamic model is 


Xfe = 


/l 

0 

0 

0 

Vo 


sin(cjfc At) 
to 

cos{ojk At) 

l—COs{LOk At) 

sin(a;fe At) 
0 


^ 1—cos(i.jfc At) ^ 

— sin(w/c At) 

sin{LoAt) 

UJ 

cos{ujk At) 

0 


o\ 

0 

0 

0 

1/ 


Xfc_i+qfe_i, 


(49) 


where the state of the target is x = (xi,ii,a; 2 ,± 2 , w), and 
a;i,a ;2 are the coordinates and xi,X 2 are the velocities in two 
dimensional space. The time step size is set to At = 1 s and 
the covariance of the process noise qk ^ A(0, Q) is 

\ 



/<7i^ 

r, -^‘2 

91 — 

0 

0 



qiAt 

0 

0 
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0 

0 <71 

At3 

3 

„ At2 

91 — 
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0 9i 

At2 

2 

qiAt 


V 0 

0 

0 

0 

we 

used Qi 

= 0.1m2s- 

and 

<72 = 1- 


0 
0 
0 
0 

<72 At/ 


(50) 


1 -^- 3 . 


In the simulation setup we have four sensors measuring the 
angles 9 between the target and the sensors. The non-linear 
measurement model for sensor i can be written as 








& & & & & 


Fig. 11. Position RMSE results of smoothers in the bearings only experiment. 


VI. Conclusion 


0* = arctan ( — -^ ) + r*, (51) 

Va:i - s\J 

where (s^, s|) is the position of the sensor i in two dimensions, 
and r* ^ N{0,ag) is the measurement noise. The used 
parameters were the same as in the article p2) . 

The RMSE results for the position errors are shown in 
Figures [T^ and [TT] Clearly all of the sigma-point methods 
outperform the Taylor series based methods (EKF/EKS). How¬ 
ever, the performances of all the sigma-point methods are very 
similar: also the Gaussian process quadrature methods give 
very similar results to the other sigma-point methods. There is 
a small dip in the errors at the Gauss-Hermite based methods 
as well as in the highest order Hammersley GPQ method, but 
practically the performance of all the sigma-point methods is 
the same. 


In this article we have proposed new Gaussian process 
quadrature based non-linear Kalman filtering and smoothing 
methods and analyzed their relationship with other sigma-point 
filters and smoothers. We have also discussed the selection 
of the evaluation points for the quadratures with respect to 
different criteria: exactness for multivariate polynomials up 
to a given order, minimum average error, and quasi-random 
point sets. We have shown that with suitable selections of 
(polynomial) covariance functions for the Gaussian processes 
the filters and smoothers reduce to unscented Kalman filters 
of different orders as well as to Gauss-Hermite Kalman 
filters and smoothers. By numerical experiments we have also 
shown that the Gaussian process quadrature rules as well 
as the corresponding filters and smoothers often outperform 
previously proposed (polynomial) integration rules and sigma- 
point filters and smoothers. 
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Appendix A 

Fourier-Hermite series 

Fourier-Hermite series (see, e.g., | [40) ) are orthogonal poly¬ 
nomial series in a Hilbert space, where the inner product 
is defined via an expectation of a product over a Gaussian 
distributions. These series are also inherently related to non¬ 
linear Gaussian filtering as they can be seen as generalizations 
of statistical linearization and they also have a deep connection 
with unscented transforms, Gaussian quadrature integration, 
and Gaussian process regression 03,03’ ©• 

We can define an inner product of multivariate scalar 
functions /(x) and p(x) as follows; 

(/, 9 ) = J /(x) 5(x) N(x I 0, 1 ) dx. (52) 

If we now define a norm via ||/||^ = (/,/), and the 
corresponding distance function d{f,g) = ||/ —then the 
functions WfWu < 00 form a Hilbert space H. It now turns 
out that the multivariate Hermite polynomials form a complete 
orthogonal basis of the resulting Hilbert space | |40) . 

A multivariate Hermite polynomial with multi-index I = 
{ii,..., in} can be defined as 


Hx{x.) = Hi^{xi) X ■ ■ ■ X Hi^{xn) (53) 


which is a product of univariate Hermite polynomials 

dP 

Hp{x) = {-ly exp(a:^/2) — exp(-a;^/2). (54) 

The orthogonality property can now be expressed as 




J!, ifJ = J 
0, otherwise, 


(55) 


where we have denoted I! = ii! • • • i„! and X = J means that 
each of the elements in the multi-indices I = {fi,..., and 
J = {ji, ■ ■ ■ ,jn} are equal. We will also denote the sum of 
indices as |I| = 

A function p(x) with {g,g) < 00 can be expanded into 
Fourier-Hermite series | |40l 

00 I 

ffW = (56) 

p=0 \I\=p 


where iTi(x) are multivariate Hermite polynomials and the se¬ 
ries coefficients are given by the inner products cx = {Hx, g)- 
Consider a Gaussian process gai^) which has zero mean 
and a covariance function K{x, x'). In the same way as deter¬ 
ministic functions, Gaussian processes can also be expanded 
into Fourier-Hermite series: 


5g(x) = Y^ YfxHxiyi), (57) 

P=0 \I\=p 


where the coefficients are given as cx = {Hi, go)- The 
coefficients cx are zero mean Gaussian random variables and 
their covariance is given as 

E [cx cj] = E [{Hx, go) {Hj,gG)] 

= JJ Hx{^)K{x,x')Hj{x.') (58) 

X N(x I 0,1) N(x' I 0,1) dx dx'. 


If we define constants Xx,j — E[cxcj] then the covariance 
function Ar(x,x') can be further written as series 


LXJ LXJ ^ 

^(x,x')=^ II II II ^Ax,jF7x(x)i7j(x'). 

-n I xri . _n I'Ti ^ ' 
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